Skip to content

Update BAWL to use reduced composition and allow for primitive structure reduction.#21

Open
msiron-entalpic wants to merge 1 commit intomainfrom
fix/bawl-primitive
Open

Update BAWL to use reduced composition and allow for primitive structure reduction.#21
msiron-entalpic wants to merge 1 commit intomainfrom
fix/bawl-primitive

Conversation

@msiron-entalpic
Copy link
Collaborator

@msiron-entalpic msiron-entalpic commented Apr 17, 2025

Problem: currently for some structures, the conventional cell returns a different hash than the primitive cell. For supercell, the full composition is used, so also a different hash is returned. (pointed out by @bkmi in #20 )

Solution: Ideally a supercell, or a conventional vs. primitive cell of a material should return the same hash. Propose update to BAWL to account for both reduced composition (which we initially did in LeMat-Bulk) and to account for transformation to primitive cell using Moyopy.

To do: would welcome benchmarks that transform materials to either primitive or conventional to test whether various hash function well against that. Or move to switch all hash to consider only the primitive transformation via Moyo?

In PR it is not reduced to primitive by default, it must be turned on:
bh = BAWLHasher(primitive_reduction=True)

@Ramlaoui
Copy link
Collaborator

Ramlaoui commented Apr 17, 2025

As suggested by @msiron-entalpic, we should enable this for any hasher or similarity matcher for fairness in the benchmark and enable them altogether when we want to reduce to the most primitive cell cf #22.

@bkmi
Copy link

bkmi commented Apr 17, 2025

There is small disagreement between moyo and spglib in 2% of the MP structures' space group. Is it obvious to you all what is the source of truth?
https://github.com/spglib/moyo/blob/main/bench/mp/analysis.ipynb

@bkmi
Copy link

bkmi commented Apr 17, 2025

The hasher worked trivially with

BAWLHasher(primitive_reduction=True).get_material_hash(s.make_supercell((2,1,1))) == BAWLHasher(primitive_reduction=True).get_material_hash(s)

@msiron-entalpic
Copy link
Collaborator Author

@bkmi spacegroup determination in Materials Project uses spglib under the hood as built into Pymatgen with a symmetry precision of 0.01 and an angle tolerance of 5 by default. By default, spglib recommends a symmetry precision of 1e-5 and a angle tolerance of -1. moyo is the newer library from the same group which does spglib. Either way, symmetry detection is not always unique and depends on precision parameters chosen, we are simply aiming for speed! Hope that helps!!

@bkmi
Copy link

bkmi commented Apr 21, 2025

Do you know the default parameters for moyo?

@bkmi
Copy link

bkmi commented Apr 21, 2025

btw these parameters are quite strict. I might suggest something more premissive, such as symmetry precision = 0.1 (or 0.01 like pymatgen). Maybe also the angle default from pymatgen.

@bkmi
Copy link

bkmi commented Apr 21, 2025

More experiments, unfortunately all is not well.

I am using a version where I allow for a higher tolerance for matches...
https://github.com/bkmi/lematerial-hasher/tree/fix/bawl-primitive

but this causes hanging behavior. I think it's due to the underlying moyo library.

from material_hasher.hasher.bawl import BAWLHasher
from pymatgen.io.cif import CifParser
from pymatgen.transformations.standard_transformations import PerturbStructureTransformation
from pymatgen.io.ase import AseAtomsAdaptor
from copy import deepcopy

parser = CifParser("the.cif")
s = parser.get_structures()[0]
sprc = deepcopy(s).make_supercell((2,1,1))

BAWLHasher(primitive_reduction=False).get_material_hash(sprc) == BAWLHasher(primitive_reduction=False).get_material_hash(s)
# false 👍 

BAWLHasher(primitive_reduction=True).get_material_hash(sprc) == BAWLHasher(primitive_reduction=True).get_material_hash(s)
# true 👍 

atoms = AseAtomsAdaptor.get_atoms(deepcopy(sprc))
atoms.rattle(0.01)
rsprc = AseAtomsAdaptor.get_structure(atoms)
BAWLHasher(primitive_reduction=True).get_material_hash(rsprc) == BAWLHasher(primitive_reduction=True).get_material_hash(s)
# this hangs 👎 

the.cif

# generated using pymatgen
data_Fe2O3
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   5.13463532
_cell_length_b   5.13463532
_cell_length_c   5.50540479
_cell_angle_alpha   62.20387762
_cell_angle_beta   62.20387762
_cell_angle_gamma   60.00000026
_symmetry_Int_Tables_number   1
_chemical_formula_structural   Fe2O3
_chemical_formula_sum   'Fe4 O6'
_cell_volume   105.92154660
_cell_formula_units_Z   2
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Fe  Fe0  1  0.85364782  0.85364782  0.43905654  1
  Fe  Fe1  1  0.35364782  0.35364782  0.93905654  1
  Fe  Fe2  1  0.14635218  0.14635218  0.56094346  1
  Fe  Fe3  1  0.64635218  0.64635218  0.06094346  1
  O  O4  1  0.44604828  0.05395172  0.75000000  1
  O  O5  1  0.05395172  0.75000000  0.75000000  1
  O  O6  1  0.75000000  0.44604828  0.75000000  1
  O  O7  1  0.94604828  0.25000000  0.25000000  1
  O  O8  1  0.25000000  0.55395172  0.25000000  1
  O  O9  1  0.55395172  0.94604828  0.25000000  1

I brought this up in spglib/moyo#89

@msiron-entalpic
Copy link
Collaborator Author

Do you know the default parameters for moyo?

I believe it is 1e-4 for symprec and for angle_tolerance it should be the same as spglib

@msiron-entalpic
Copy link
Collaborator Author

btw these parameters are quite strict. I might suggest something more premissive, such as symmetry precision = 0.1 (or 0.01 like pymatgen). Maybe also the angle default from pymatgen.

Unsure about the implications of this, but let me make a fix to be able to pass on custom tolerance parameters for BAWL!

@bkmi
Copy link

bkmi commented Apr 23, 2025

I did that in my fork. Sorry I didn't pull request it. I'll do it now.

@bkmi
Copy link

bkmi commented Apr 23, 2025

Done

Copy link
Collaborator

@Ramlaoui Ramlaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bkmi
Copy link

bkmi commented Apr 23, 2025

I used relatively permissive parameters. I think we should try to figure out what's the deal with the hanging before merging this (using the permissive parameters).

I think making it work with permissive parameters is important.

@msiron-entalpic
Copy link
Collaborator Author

I think making it work with permissive parameters is important.

Actually you're right, I was thinking of downstream effect since we use this code for labeling fingerprints for LeMat-Bulk, but currently we use Pymatgen with the less stringent symprec, so if we move to Moyo we should also use the less stringent parameters by default.

@bkmi
Copy link

bkmi commented May 8, 2025

It sounds like the hanging might have been caused by using degrees when I should have used radians.

I'm busy for a while but will follow up later. This tool would be extremely useful for us.

@bkmi
Copy link

bkmi commented Jun 11, 2025

I fixed it.

Specifically, I finished my pull request onto fix/bawl-primitive. Namely, this one #23. Please merge my fork into into fix/bawl-primitive and then merge this into main!

Thank you!!

@bkmi
Copy link

bkmi commented Jun 19, 2025

The hasher does not work reliably with moyo. I am investigating making pymatgen the default.

It is not recommended to use angle_tolerance in any case... according to this one random post (and the fact that using angle_tolerance lead to some cases hanging). spglib/spglib#567

--

p.s. it remains to be seen whether standard, primitive cells (used in spglib & moyo) are the same as niggli reduced primitive cells. pymatgen claims that standard, primitive are defined in the source

Setyawan, W., & Curtarolo, S. (2010). High-throughput electronic band structure calculations: Challenges and tools. Computational Materials Science, 49(2), 299-312. doi:10.1016/j.commatsci.2010.05.010.

@bkmi
Copy link

bkmi commented Jun 19, 2025

I cannot comment yet on the quality of the hash, but I'm getting much more stable behavior using my new implementation that uses the reduction and primitive functions from pymatgen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants